Palaeoproteomics guidelines to identify proteinaceous binders in artworks following the study of a 15th-century painting by Sandro Botticelli’s workshop

Undertaking the conservation of artworks informed by the results of molecular analyses has gained growing importance over recent decades, and today it can take advantage of state-of-the-art analytical techniques, such as mass spectrometry-based proteomics. Protein-based binders are among the most common organic materials used in artworks, having been used in their production for centuries. However, the applications of proteomics to these materials are still limited. In this work, a palaeoproteomic workflow was successfully tested on paint reconstructions, and subsequently applied to micro-samples from a 15th-century panel painting, attributed to the workshop of Sandro Botticelli. This method allowed the confident identification of the protein-based binders and their biological origin, as well as the discrimination of the binder used in the ground and paint layers of the painting. These results show that the approach is accurate, highly sensitive, and broadly applicable in the cultural heritage field, due to the limited amount of starting material required. Accordingly, a set of guidelines is suggested, covering the main steps of the data analysis and interpretation of protein sequencing results, optimised for artworks.


1.1. Materials and methods
Ten mock-ups (A-J) produced for a variety of experimental purposes over the past 80 years and now held in the reference archive at the National Gallery, London (see Table 1 in the main text) were studied to test the analytical protocol on selected paint systems. Mock-ups A-D were reconstructions prepared at the National Gallery in 2005 using egg yolk as a binder and some common historical pigments: lead white, malachite, chalk, and iron oxide, respectively. This allowed the analytical protocol to be tested with paints based on different metal-containing pigments. Mock-ups E and F were reconstructions prepared at the National Gallery in 1978 using rabbit skin glue as a binder, and lead white and smalt pigments, respectively, providing slightly older paint samples containing a different proteinaceous binder. Mock-ups G and H were part of the Fogg reference set, prepared at the Fogg Museum, Harvard in 1933. These reconstructions were composed of two layers: a preparation layer containing rabbit skin glue and gypsum, followed by a paint layer containing yellow ochre bound with egg yolk and oil, respectively. The samples in each case contained material from both layers, testing the ability of the protocol to find proteins from different sources in the same sample. Finally, mock-ups I and J were reconstructions prepared at the National Gallery in 2013 with paint containing a madder lake pigment bound in linseed oil and egg yolk, respectively. The preparation of the lake pigment involves the extraction of a red colourant from dyed sheep wool using alkaline conditions at high temperature. It has been shown via infrared spectroscopy that the preparation of lake pigments in this way results in the co-precipitation of the wool protein with the pigment 1,2; it was therefore anticipated that evidence for the presence of sheep keratins might be observed in these samples.
Since the protein of interest in this case is not associated with the binder, but with the pigment itself, and is therefore likely to be present in smaller amounts than the binder, samples from mock-ups I and J were slightly larger than the others to maximise the chance of observing sheep keratins.
The paint samples were removed using a scalpel and two sample sizes, "small" and "large", were collected and processed separately. It should be noted that the sizes of the samples in the two sets were estimated by eye. Paint samples are typically between 20 and 300 µg in size (see, for example, ref. 3). The small sample was estimated to be at the lower end of this range, while the large sample was approximately three times the amount of the small sample. In addition, the protein content in each sample will differ depending on the pigment/binder ratio of the paint. This ratio is not known, since the paints were prepared following a traditional approach of adding the binder to the pigment in increasing amounts until the mixture reached the desired consistency. The smaller of the two samples was estimated to be close to the amount that would normally be sampled for lipid analysis by gas chromatography-mass spectrometry (GC-MS), while the larger was estimated to be the amount required for reliable amino acid analysis by GC-MS. Confident protein identification was first established using the larger samples, which also allowed checking for any adverse effects of the pigment on extraction and/or analysis; the capability of the protocol to identify proteins in the smaller samples was then assessed.
The samples from the mock-ups were processed following a protocol similar to the one described in the main text (Sections 5.2-5.3), with two main differences. First, the extracted peptides were eluted from StageTips into a 96-well MS plate using 20 μL of 40% ACN, 0.1% TFA in water, followed by 10 μL of 60% ACN, 0.1% TFA in water; the two eluates were combined into a single sample for injection. Second, the MS analysis was performed on a Q-Exactive HF (Thermo Scientific, Bremen, Germany), operated under the same conditions as the Q-Exactive HF-X (see main text, Section 2.4) but with the maximum MS/MS ion injection time set to 108 ms.
The data analysis for these samples consisted of two MaxQuant runs for each sample. The experimental MS/MS spectra were first matched against a reference database containing all the publicly available sequences for the most common proteinaceous paint binders (collagens, egg proteins, milk proteins). The second MaxQuant run matched the spectra against the SwissProt database (downloaded January 2017) 4. For both samples of mock-ups I and J, two more searches were performed against a database containing keratins from Ovis aries, searching for tryptic peptides in the first search and for unspecific peptides in the second one. All other parameters were the same for all searches, as described in Section 5.4 of the main text.

Results and Discussion
Protein identifications
The first objective of the analysis of the mock-ups was to verify whether the experimental protocol allowed for the confident characterisation of protein residues extracted from a paint matrix. The source material and the taxonomic origin of the proteins were confidently identified in all cases except for the glue layer of the two small samples of mock-ups G and H (Supplementary Table S1). For all the other samples, the origin of egg yolk and animal glue was identified at species level due to the identification of species-specific peptides for several binder proteins. Proteins derived from the same material (e.g., egg or animal glue) in a sample likely all come from the same species, as long as there is no evidence to the contrary. Therefore, following the parsimony principle, proteins without species-specific peptides are assigned to the same species when all unspecific peptides are compatible with that species identification (in the mock-ups, Gallus gallus for egg proteins, Oryctolagus cuniculus for collagens, and Ovis aries for non-human keratins).
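The parsimony rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in MaxQuant or used to produce Supplementary Table S1: the function name `assign_species`, the input layout (each peptide represented by the set of species whose sequences contain it), and the placeholder protein names are hypothetical, and the function is meant to be applied to the proteins of one material (e.g., egg) at a time.

```python
from typing import Dict, List, Optional, Set


def assign_species(proteins: Dict[str, List[Set[str]]]) -> Dict[str, Optional[str]]:
    """Assign a species to each protein of a single material.

    `proteins` maps a protein name to its identified peptides; each peptide
    is given as the set of species whose sequence contains that peptide
    (hypothetical input layout).
    """
    # Species supported by at least one species-specific peptide
    # (a peptide compatible with exactly one species).
    anchors: Set[str] = set()
    for peptides in proteins.values():
        for species in peptides:
            if len(species) == 1:
                anchors |= species

    result: Dict[str, Optional[str]] = {}
    for name, peptides in proteins.items():
        specific = {next(iter(s)) for s in peptides if len(s) == 1}
        if len(specific) == 1:
            # Direct evidence: the protein has its own species-specific peptide(s).
            result[name] = next(iter(specific))
        elif not specific and len(anchors) == 1 and all(anchors <= s for s in peptides):
            # Parsimony: no specific peptide, but every unspecific peptide is
            # compatible with the single species found for the material.
            result[name] = next(iter(anchors))
        else:
            # Conflicting or insufficient evidence: leave unassigned.
            result[name] = None
    return result


# Illustrative, made-up peptide evidence for egg proteins.
egg = {
    "protein_1": [{"Gallus gallus"}, {"Gallus gallus", "Meleagris gallopavo"}],
    "protein_2": [{"Gallus gallus", "Meleagris gallopavo"}],
    "protein_3": [{"Meleagris gallopavo", "Anas platyrhynchos"}],
}
assignments = assign_species(egg)
```

In this toy input, `protein_2` inherits the species of `protein_1` by parsimony, while `protein_3` stays unassigned because its only peptide is incompatible with that species; such a case would count as evidence to the contrary and call for manual inspection.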
The identification of proteins not present in the original composition of the paint, i.e. egg proteins in mock-up H, might be due to a carryover of peptides from other samples during the chromatography. Carryover contamination was not observed for the other samples. In addition, since egg proteins were identified in both samples of mock-up H, the contamination might have occurred during the preparation, storage or sampling of the mock-up paint films. Nonetheless, following this result, harsher washing protocols with a steep gradient of the organic eluent have been implemented between chromatographic runs.
Supplementary Table S1 reports a summary of the proteins identified in the mock-ups. Supplementary Table S4 (in a separate Excel SI file) reports in detail the list of proteins identified in the mock-ups, including the keratins identified in mock-ups I and J by matching the raw data against the SwissProt database. Keratins, primarily from humans, are the most common protein contaminants in laboratory environments, but their presence can also be due to object and/or sample handling, making keratin contamination almost unavoidable. Nonetheless, several non-human keratins were identified in the samples containing madder lake (mock-ups I and J), probably as a result of the extraction of the dye from sheep wool, as shown by the presence of peptides not matching human proteins (Supplementary Table S5, in a separate Excel SI file). The identified peptides also allowed for the identification of sheep (Ovis aries) as the taxonomic origin of the wool. In particular, in the large sample of mock-up I, a peptide with a sequence matching only sheep and goat (Capra hircus) was identified, together with several peptides not matching the goat sequence; therefore, sheep is the only species for which the simultaneous identification of these peptides is possible. In general, for most of the wool proteins the species was identified by excluding other species deemed unlikely because they do not produce wool, such as red deer (Cervus elaphus hippelaphus). The identification of the species of origin of the keratins was made easier by the prior knowledge that any animal keratin identified in these samples probably came from the extraction of the colourant from the wool.
In the case of an unknown sample, the interpretation of results with unspecific peptides would have been more ambiguous without further evidence.
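The reasoning used for the wool keratins amounts to intersecting the sets of species compatible with each identified peptide and then discarding species ruled out by prior knowledge. A minimal sketch follows, assuming each peptide has already been reduced to the set of species it matches; the function name `candidate_species` and the example peptide sets are hypothetical and do not reproduce the data in Supplementary Table S5.

```python
from typing import AbstractSet, Iterable, Optional, Set


def candidate_species(
    peptide_matches: Iterable[AbstractSet[str]],
    excluded: AbstractSet[str] = frozenset(),
) -> Set[str]:
    """Return the species compatible with every peptide, minus excluded ones."""
    candidates: Optional[Set[str]] = None
    for species in peptide_matches:
        # Keep only species that also explain this peptide.
        candidates = set(species) if candidates is None else candidates & species
    return (candidates or set()) - excluded


# Case 1: one peptide shared by sheep and goat, plus a peptide that does not
# match the goat sequence: only sheep explains both observations.
case_1 = candidate_species([
    {"Ovis aries", "Capra hircus"},
    {"Ovis aries", "Cervus elaphus hippelaphus"},
])

# Case 2: peptides shared with red deer; the deer is excluded on the prior
# knowledge that it does not produce wool.
case_2 = candidate_species(
    [{"Ovis aries", "Cervus elaphus hippelaphus"}],
    excluded={"Cervus elaphus hippelaphus"},
)
```

For an unknown sample the `excluded` set cannot be justified in the same way, which is why unspecific peptides alone would leave more than one candidate.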
The harsh treatment used for the extraction of madder colourants from dyed wool during the preparation of the pigment (strongly alkaline conditions at high temperatures) is very likely to cause the partial hydrolysis of the keratins from the wool. In order to verify the presence of peptides formed during this process, the samples from mock-ups I and J were searched against a database of sheep keratin sequences, first setting the software to search for tryptic peptides, and in a second run for unspecifically cleaved peptides. The large sample of mock-up J is discussed here as an example of the results obtained. In both searches, at least six species-specific sheep keratins were identified, highlighting the importance of the choice of database in the data analysis, as only one peptide specific for sheep/goat was found in the original SwissProt search (see the discussion of databases in Section 3 of the main text). In the unspecific cleavage search, the number of identified peptides was indeed higher than in the tryptic-specific search, showing that spontaneous hydrolysis had occurred, probably during the preparation of the lake pigment. However, the aim of the analysis of these samples, that is, the identification of ovicaprid and non-human keratins co-extracted with the madder lake and detectable in the paint, had already been achieved with the run against the SwissProt database. Therefore, detailed results from the further searches are not reported, as in this particular case further characterisation of the keratins is not necessary. Nonetheless, readers can find the results of these MaxQuant searches in the PRIDE entry connected to this work.
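The distinction between the two searches can be illustrated with a simple check of peptide termini. The sketch below assumes the common simplification that trypsin cleaves after every lysine (K) or arginine (R), ignoring any proline rule; the function name `is_tryptic` and the toy sequence are hypothetical and are not taken from a real keratin.

```python
def is_tryptic(protein: str, peptide: str) -> bool:
    """Return True if both peptide termini are consistent with trypsin cleavage.

    Simplified rule (an assumption of this sketch): cleavage after K or R.
    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    # N-terminus: protein start, or preceded by K/R.
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    # C-terminus: protein end, or the peptide itself ends in K/R.
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    return n_term_ok and c_term_ok


# Toy sequence for illustration only.
toy_protein = "MAKTESTRPEPTIDEKLAST"
peptides = ["TESTR", "ESTRPE", "LAST"]
non_tryptic = [p for p in peptides if not is_tryptic(toy_protein, p)]
```

Peptides such as `ESTRPE` in this toy example are reachable only by an unspecific-cleavage search, so an excess of identifications in that search relative to the tryptic search is consistent with hydrolysis that occurred before digestion.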
The identification of keratins from wool provides valuable information about pigment manufacture, but the presence of these proteins could be misleading when protein analysis is performed with techniques other than proteomics, or if proteomics approaches were used without knowledge of the pigments present in the sample. One of the most common techniques for the investigation of proteins in paints is GC-MS, based on the quantitative analysis of amino acids in a sample compared to the relative amounts within proteinaceous material standards. The presence of unexpected protein-based materials will give an unknown amino acid profile, as mentioned in previous literature 5. Colombini et al. 5 also mention that the presence of fungi and bacteria might interfere with the identification of the original material, since the co-extraction of proteins from these organisms is possible. Using these protocols, the unknown amino acid profile might be forced into the profile of one of the standard materials during the statistical analysis of the GC-MS results. The analysis of paint samples with proteomics overcomes this problem by allowing the confident identification of all the proteins present, even from materials that are not expected or are considered unconventional.
The samples chosen to test the protocol were selected in order to include a range of different pigments. The presence of certain metal species derived from pigments has been shown to influence protein analysis approaches in paint samples 5-7, but the influence of pigments on a proteomics protocol has never been investigated. The results obtained in the current study, evaluated in terms of number of total peptides identified in each sample, show some variability. However, this appears to be related to the sample size (and amount of protein in the sample) rather than to any influence of the specific pigment, which could not be investigated in detail since samples were not analysed in replicates and quantification was not performed. A qualitative evaluation of the results shows that the presence of the different pigments selected did not hinder the confident characterisation of the protein residues in any of the samples. Therefore, the implementation of a clean-up step, occasionally included in sample treatment protocols for protein analysis to remove pigments 6,8,9, is not necessary and might, on the contrary, cause loss of peptides.
As discussed in Section 1.1, the protein content in each sample will differ depending on the pigment/binder ratio of the paint. This is related to the specific pigment used, how the paint was made, and the amount of sample analysed. Neither the pigment/binder ratio nor the weight of the sample was measured in these experiments, and therefore the size of the samples can be considered only approximately constant throughout the "small" and "large" sample sets. In particular, both samples of mock-ups I and J were intentionally larger than the others in the respective sample sets, in order to increase the chances of detecting wool peptides. This is reflected in the high number of egg peptides identified in both samples of mock-up J (Supplementary Figure S1 and Supplementary Table S1). The pigment/binder ratio and the sample size are probably the primary causes of the variation in the number of total peptides identified in the mock-ups (Supplementary Figure S1).
Since the paint reconstructions were prepared in different years, the age of the paint might also be a factor affecting the degradation of proteins, and therefore the amount of protein extracted. However, no pattern of correlation between the age of the mock-ups and protein recovery was observed. On the contrary, the number of peptides identified in both samples of mock-up G was the second highest of each sample set, despite G being prepared up to 80 years before the other paints (see Table 1 in the main text). This might suggest that, in this case, the difference in age-related protein damage across the sample set was not a significant influence on protein recovery, but a systematic assessment of the impact of age within samples of a single pigment was not undertaken. It should also be noted that these mock-ups have had a relatively short ageing period compared to Old Master paintings such as the 15th-century painting also studied in this work. Further, these mock-ups have had only limited light exposure, having remained within the laboratory environment.
Although the presence of pigments did not appear as a significant influence on protein analysis in this work, dedicated studies should be performed to assess if any metal species, and in particular those derived from pigments, influence the proteomic characterisation of protein residues extracted from artworks and paintings.

Protein damage
The calculated levels of deamidation for the mock-ups are reported in Supplementary Table S2 and Supplementary Figure S2. Only the data for the large samples are reported in the graph, since the deamidation levels are comparable between the two sizes. The deamidation level was not considered reliable when fewer than 20 peptides could be used for the calculations, based on similar occurrences in recent literature 10.
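As an illustration of how the reliability threshold enters the calculation, the sketch below computes an unweighted deamidation percentage for one residue type (N or Q) and reports no value when fewer than 20 peptides are available. This is an assumption-laden simplification: the function name `deamidation_level` and the input layout are hypothetical, and the calculation actually used for Supplementary Table S2 may differ, for example by weighting peptides by their signal intensity.

```python
from typing import List, Optional, Tuple

MIN_PEPTIDES = 20  # reliability threshold stated in the text


def deamidation_level(
    peptides: List[Tuple[int, int]],
    min_peptides: int = MIN_PEPTIDES,
) -> Optional[float]:
    """Unweighted percentage of deamidated sites for one residue type.

    `peptides` holds one (deamidated_sites, total_sites) pair per identified
    peptide (hypothetical layout). Returns None when fewer than
    `min_peptides` peptides contain the residue, i.e. when the level is
    not considered reliable.
    """
    # Only peptides that contain the residue contribute to the calculation.
    usable = [(d, t) for d, t in peptides if t > 0]
    if len(usable) < min_peptides:
        return None
    deamidated = sum(d for d, _ in usable)
    total = sum(t for _, t in usable)
    return 100.0 * deamidated / total


# Made-up example: 25 peptides with one asparagine each, 5 of them deamidated.
level_n = deamidation_level([(1, 1)] * 5 + [(0, 1)] * 20)
# With only 19 usable peptides the level is not reported.
too_few = deamidation_level([(1, 1)] * 19)
```

Returning `None` rather than a number keeps unreliable values out of downstream averages and plots instead of silently including them.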
The level of deamidation has often been regarded as an indicator of the ageing of ancient proteins 11, and can be used in cultural heritage studies to distinguish original proteins and modern contaminants (Ramsøe et al. 12 and literature therein) by comparing the results from the sample with a standard material of similar composition. However, as previously observed for materials used in artworks 13 and discussed in Section 3 of the main text, a significant source of deamidation is likely to be the processing of collagen to make animal glue 14. In the mock-up samples, this is shown by the difference in the damage of egg proteins in mock-ups A-J (average level: 6% for asparagine (N), 1% for glutamine (Q)) compared to the collagen in mock-ups E-F (average level: 56% for N, 2% for Q) (compare to Figure 2 in the main text).
Although the major cause of collagen deamidation is probably the preparation of the glue, a limited investigation into the effects of ageing on other proteins was possible, since the reconstructions examined in this study were made at different times. As noted above, all of the samples are relatively 'young' and have had limited light exposure, and the pigments present vary between the samples, making it difficult to draw firm conclusions. That said, mock-ups G and J, for example, both contained paint bound with egg yolk, but were prepared in 1933 and 2013, respectively. In addition, some animal glue proteins (collagen) from the preparation layer were also present in mock-up G, while wool proteins (keratin) derived from the lake pigment were present in mock-up J. The comparison of the deamidation levels of the different protein sources in both samples is shown in Supplementary Figure S3, which clearly indicates a higher deamidation level of the egg proteins in mock-up G (31% for N, 57% for Q) compared to the egg proteins in J (5% for N, 1% for Q). Although factors such as the effect of different pigments upon the ageing behaviour of the protein binder remain unknown, it is interesting to note that the higher deamidation level occurs in the egg proteins in mock-up G, prepared approximately 80 years before mock-up J. Furthermore, the level of deamidation of egg proteins in J is very similar to the deamidation level observed in mock-ups A-D (5% for N, 1% for Q), prepared in 2005. The wool keratins in mock-up J show a much higher damage level (73% for N, 19% for Q), almost certainly due to the relatively harsh conditions (high temperature, alkaline) used to extract the madder colourants from the dyed wool during the preparation of the lake. The influence of this treatment is also evident in the damage level of mock-up I (60% for N, 14% for Q), in which the only non-contaminant proteins detected were wool keratins.
Taken together, these results highlight the importance of understanding the source of all the different protein components found in a sample, and illustrate how factors such as processing and ageing of the materials can influence the results obtained by proteomics.

Summary of other analytical results (SEM-EDX and FTIR) for The Virgin and Child with Saint John and an Angel
Two samples (labelled IS2 and IS9) were removed from locations similar to those of 1:BP/1:GL and 2:YP, the blue cloak of the Virgin and the yellow drapery on the Angel's arm, respectively. The samples were embedded in polyester resin and prepared as cross-sections. Examination was carried out with an optical microscope in ordinary light and UV light (Supplementary Figure S4). In both cases, the stratigraphy was documented, showing a white ground layer, one or more paint layers, and a varnish layer, and the major inorganic components in the paint and preparation layers were confirmed by SEM-EDX analysis (Supplementary Table S3).
Supplementary Figure S4

The three powdered samples taken for proteomic analysis (labelled 1:BP, 1:GL and 2:YP) were examined visually under the microscope and small, representative samples from each were compressed in a diamond cell and analysed by Fourier transform infrared (FTIR) microscopy (Supplementary Table S3). The results from the cross-sections and scrapings were then compared to ensure as much as possible was known about the inorganic materials in the samples before proteomics was performed.